Named entity translation using anchor texts
نویسندگان
چکیده
This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates a NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents and finding the best translation from the anchor texts, using a combination of features, some of which, are specific to anchor texts. Experiments performed on a manually built corpora, suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using sorely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.
منابع مشابه
A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as a task of categorizing anchor texts with linked HTML texts which glosses a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. In order to incorporate...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کاملتشخیص اسامی اشخاص با استفاده از تزریق کلمههای نامزد اسم در میدانهای تصادفی شرطی برای زبان عربی
Named Entity Recognition and Extraction are very important tasks for discovering proper names including persons, locations, date, and time, inside electronic textual resources. Accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
متن کاملSentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm
We present an e cient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm is composed of a length-based and anchor matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accuracy of anchor nding methods. The accuracy of nding cognates for Hungarian-English language pair is e...
متن کاملImpact of translation on named-entity recognition in radiology texts
Database URL https://github.com/lasigeBioTM/MRRAD.
متن کامل